Abstract. Gene assembly is an intricate biological process that has been
studied formally and modeled through string and graph rewriting systems.
Recently, a restriction of the general (intramolecular) model, called
simple gene assembly, has been introduced. This restriction has
subsequently been defined as a string rewriting system. We show that by
extending the notion of overlap graph it is possible to define a graph
rewriting system for two of the three types of rules that make up simple
gene assembly. It turns out that this graph rewriting system is less
involved than its corresponding string rewriting system. Finally, we give
characterizations of the 'power' of both types of graph rewriting rules.
Because of the equivalence of these string and graph rewriting systems,
the given characterizations can be carried over to the string rewriting
system.



1 Introduction 

Gene assembly is a highly involved process occurring in unicellular organisms
called ciliates. Ciliates have two nuclei that differ both functionally and
physically, called the micronucleus and the macronucleus. Gene assembly occurs
during sexual reproduction of ciliates, and transforms a micronucleus into a
macronucleus. This process is highly parallel and involves many splicing and
recombination operations - this is true for the stichotrichs group of ciliates
in particular. During gene assembly, each gene is transformed from its
micronuclear form to its macronuclear form.

Gene assembly has been extensively studied formally, see [1]. The process
has been modeled as either a string or a graph rewriting system [2,3]. Both
systems are 'almost equivalent', and we refer to these as the general model. In
[4] a restriction of this general model has been proposed. While this model is
less powerful than the general model, it is powerful enough to allow each known
gene [5] in its micronuclear form to be transformed into its macronuclear form.
Moreover, this model is less involved and is therefore called the simple model.
The simple model was first defined using signed permutations [4], and later
proved equivalent to a string rewriting system [6]. The graph rewriting system
of the general model is based on overlap graphs. This system is an abstraction
of the string rewriting system in the sense that certain local properties within
the strings are lost in the overlap graph. Therefore overlap graphs are not
suited for the simple gene assembly model. In this paper we show that by
naturally extending the notion of overlap graph we can partially define simple
gene assembly as a graph rewriting system. These extended overlap graphs form an
abstraction of the string model, and are in some ways easier to deal with. This
is illustrated by characterizing the power of two of the three types of
recombination operations that make up simple gene assembly. While this
characterization is based on extended overlap graphs, due to the equivalence of
the two systems it can be carried over to the string rewriting system for simple
gene assembly.



2 Background: Gene Assembly in Ciliates 

In this section we very briefly describe the process of gene assembly. For a
detailed account of this process we refer to [1]. Gene assembly occurs in a group
of unicellular organisms called ciliates. A characterizing property of ciliates is
that they have two nuclei that differ both functionally and physically, called the
micronucleus (MIC) and the macronucleus (MAC). Each gene occurs both in the
MIC and in the MAC, but in very different forms. The MIC form of a gene
consists of a number of DNA segments $M_1, \ldots, M_\kappa$, called MDSs, which
occur in some fixed permutation on a chromosome. The MDSs are separated by
non-coding DNA segments. Moreover, each MDS can occur inverted, i.e. rotated
180 degrees. For example, the gene in MIC form encoding the actin protein in
the ciliate Sterkiella nova is given in Figure 1 (see [7,5]). Notice that $M_2$
occurs inverted.

Fig. 1. The structure of the MIC gene encoding the actin protein in Sterkiella nova.

Fig. 2. The structure of a MAC gene consisting of $\kappa$ MDSs.

In the MAC form of the gene, the MDSs occur as a sequence $M_1, \ldots, M_\kappa$
where each two consecutive MDSs overlap, see Figure 2. The shaded areas in the
figure represent the overlapping segments and are called pointers. Moreover,
there are two sequences denoted by b and e, which occur on $M_1$ and $M_\kappa$
respectively, that indicate the beginning and ending of the gene. The sequences
b and e are called markers. The process of gene assembly transforms the MIC into
the MAC, thereby transforming each gene in the MIC form to the MAC form. Hence,
for each gene the MDSs are 'sorted' and put in the right orientation (i.e., they
do not occur inverted). This links gene assembly to the well-known theory of
sorting by reversal [8].

It is postulated that there are three types of recombination operations that
cut-and-paste the DNA to transform the gene from the MIC form to the MAC
form. These operations are defined on pointers, so one can abstract from the
notion of MDSs by simply considering the MIC gene as a sequence of pointers
and markers, see Figure 3, which corresponds to the gene in MIC form of Figure 1.
The pointers are numbered according to the MDS they represent: the pointer
on the left (right, resp.) of MDS $M_i$ is denoted by $i$ ($i+1$, resp.). Pointers
or markers that appear inverted are indicated by a bar: hence pointers 2 and 3
corresponding to MDS $M_2$ appear inverted and are therefore denoted by $\bar{2}$
and $\bar{3}$ respectively. In the general model the markers are irrelevant, so in
that case only the sequence of pointers is used.

Fig. 3. Sequence of pointers and markers representing the gene in MIC form.



3 Legal Strings with Markers 

For an arbitrary finite alphabet $A$, we let $\bar{A} = \{\bar{a} \mid a \in A\}$
with $A \cap \bar{A} = \emptyset$. We use the 'bar operator' to move from $A$ to
$\bar{A}$ and back from $\bar{A}$ to $A$. Hence, for $p \in A \cup \bar{A}$,
$\bar{\bar{p}} = p$. For a string $u = x_1 x_2 \cdots x_n$ with $x_i \in A$, the
inverse of $u$ is the string $\bar{u} = \bar{x}_n \bar{x}_{n-1} \cdots \bar{x}_1$.
We denote the empty string by $\lambda$.

We fix $\kappa \geq 2$, and define the alphabet $\Delta = \{2, 3, \ldots, \kappa\}$
and the alphabet $\Pi = \Delta \cup \bar{\Delta}$. The elements of $\Pi$ are called
pointers. For $p \in \Pi$, we define $\|p\|$ to be $p$ if $p \in \Delta$, and
$\bar{p}$ if $p \in \bar{\Delta}$, i.e., $\|p\|$ is the 'unbarred' variant of $p$.
A legal string is a string $u \in \Pi^*$ such that for each $p \in \Pi$ that occurs
in $u$, $u$ contains exactly two occurrences from $\{p, \bar{p}\}$.

Let $M = \{b, e\}$ with $\Delta \cap \{b, e\} = \emptyset$. The elements of $M$ are
called markers. We let $\Xi = \Delta \cup \{b, e\}$, and let
$\Psi = \Xi \cup \bar{\Xi}$. We define the morphism $\mathrm{rm} : \Psi^* \to \Pi^*$
as follows: $\mathrm{rm}(a) = a$ for all $a \in \Pi$, and $\mathrm{rm}(m) = \lambda$
for all $m \in M \cup \bar{M}$. We say that a string $u \in \Psi^*$ is an extended
legal string if $\mathrm{rm}(u)$ is a legal string and $u$ has one occurrence from
$\{b, \bar{b}\}$ and one occurrence from $\{e, \bar{e}\}$. We fix $m \notin \Psi$
and define for each $q \in M \cup \bar{M}$, $\|q\| = m$.

An extended legal string represents the sequence of pointers and markers
of a gene in MIC form. Hence, the extended legal string corresponding to
Figure 3 is $3\,4\,4\,5\,6\,7\,5\,6\,7\,8\,9\,e\,\bar{3}\,\bar{2}\,b\,2\,8\,9$. The
legal string corresponding to this figure is
$3\,4\,4\,5\,6\,7\,5\,6\,7\,8\,9\,\bar{3}\,\bar{2}\,2\,8\,9$ (without the markers).
Legal strings are considered in the general model since markers are irrelevant
there.
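
The two definitions above can be checked mechanically. The following is a minimal Python sketch under an encoding that is an assumption of this example, not part of the paper: a string is written as space-separated tokens and a leading `-` stands for a bar (so `-3` is the barred pointer 3). The helper names `parse`, `rm`, `is_legal` and `is_extended_legal` are hypothetical.

```python
from collections import Counter

MARKERS = {"b", "e"}

def parse(text):
    """Turn '3 -2 b' into [(name, barred), ...]; '-' marks a bar (assumed encoding)."""
    return [(tok.lstrip("-"), tok.startswith("-")) for tok in text.split()]

def rm(u):
    """The morphism rm: erase markers and keep pointers unchanged."""
    return [x for x in u if x[0] not in MARKERS]

def is_legal(u):
    """A legal string has no markers and every pointer occurs exactly twice (bars ignored)."""
    if any(name in MARKERS for name, _ in u):
        return False
    counts = Counter(name for name, _ in u)
    return all(c == 2 for c in counts.values())

def is_extended_legal(u):
    """rm(u) is legal, and u holds exactly one b-marker and one e-marker."""
    counts = Counter(name for name, _ in u)
    return is_legal(rm(u)) and counts["b"] == 1 and counts["e"] == 1
```

For instance, encoding the sequence of Figure 3 as `3 4 4 5 6 7 5 6 7 8 9 e -3 -2 b 2 8 9` (bars on the two pointers of the inverted MDS, as described in Section 2) gives an extended legal string, and applying `rm` to it gives a legal string.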

The domain of a string $u \in \Psi^*$ is $\mathrm{dom}(u) = \{\|p\| \mid p$ occurs
in $u\}$. Note that $m \in \mathrm{dom}(v)$ for each extended legal string $v$. Let
$q \in \mathrm{dom}(u)$ and let $q_1$ and $q_2$ be the two occurrences in $u$ with
$\|q_1\| = \|q_2\| = q$. Then $q$ is positive in $u$ if exactly one of $q_1$ and
$q_2$ is in $\Xi$ (the other is therefore in $\bar{\Xi}$). Otherwise, $q$ is
negative in $u$.

Example 1. String u = 24b4e2 is an extended legal string since rm(u) = 2442 is
a legal string. The domain of u is dom(u) = {m, 2, 4}. Now, m and 2 are positive
in u, and 4 is negative in u.
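
The notions of domain and sign can be sketched in a few lines of Python. The encoding (space-separated tokens, leading `-` for a bar) and the names `parse`, `norm`, `dom` and `is_positive` are assumptions of this sketch. The test string `2 4 b 4 -e -2` is one bar assignment that agrees with the signs stated in Example 1; it is chosen for illustration only.

```python
MARKERS = {"b", "e"}

def parse(text):
    """Turn '2 -e' into [(name, barred), ...]; '-' marks a bar (assumed encoding)."""
    return [(tok.lstrip("-"), tok.startswith("-")) for tok in text.split()]

def norm(sym):
    """||x||: drop the bar; both markers are mapped to the single name 'm'."""
    name, _ = sym
    return "m" if name in MARKERS else name

def dom(u):
    """The domain of u: the set of unbarred names occurring in u."""
    return {norm(x) for x in u}

def is_positive(u, q):
    """q is positive iff exactly one of its two occurrences carries a bar."""
    bars = [x[1] for x in u if norm(x) == q]
    return len(bars) == 2 and bars[0] != bars[1]
```

A pointer or marker that is not positive is negative, so no separate function is needed for the negative case.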

Let $u = x_1 x_2 \cdots x_n$ be an (extended) legal string with $x_i \in \Psi$ for
$1 \leq i \leq n$, and let $p \in \mathrm{dom}(u)$. The p-interval of u is the
substring $x_i x_{i+1} \cdots x_j$ where $i$ and $j$ with $i < j$ are such that
$\|x_i\| = \|x_j\| = p$.

Next we consider graphs. A signed graph is a graph $G = (V, E, \sigma)$, where $V$
is a finite set of vertices, $E \subseteq \{\{x, y\} \mid x, y \in V, x \neq y\}$ is
a set of (undirected) edges, and $\sigma : V \to \{+, -\}$ is a signing, and for a
vertex $v \in V$, $\sigma(v)$ is the sign of $v$. We say that $v$ is negative in $G$
if $\sigma(v) = -$, and $v$ is positive in $G$ if $\sigma(v) = +$. A signed directed
graph is a graph $G = (V, E, \sigma)$, where the edges are directed:
$E \subseteq V \times V$. For $e = (v_1, v_2) \in E$, we call $v_1$ and $v_2$
endpoints of $e$. Also, $e$ is an edge from $v_1$ to $v_2$.

4 Simple and General String Pointer Rules 

Gene assembly has been modeled using three types of string rewriting rules on
legal strings. These types of rules correspond to the types of recombination
operations that perform gene assembly. We will recall the string rewriting rules
now - together they form the string pointer reduction system, see [2, 1]. The
string pointer reduction system consists of three types of reduction rules
operating on legal strings. For all $p, q \in \Pi$ with $\|p\| \neq \|q\|$:

- the string negative rule for p is defined by $\mathrm{snr}_p(u_1 p p u_2) = u_1 u_2$,

- the string positive rule for p is defined by $\mathrm{spr}_p(u_1 p u_2 \bar{p} u_3) = u_1 \bar{u}_2 u_3$,

- the string double rule for p, q is defined by $\mathrm{sdr}_{p,q}(u_1 p u_2 q u_3 p u_4 q u_5) = u_1 u_4 u_3 u_2 u_5$,

where $u_1, u_2, \ldots, u_5$ are arbitrary (possibly empty) strings over $\Pi$.
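
The three rules can be sketched as list operations. This is a minimal Python illustration under assumptions of this example: strings are lists of `(name, barred)` pairs parsed from space-separated tokens with a leading `-` for a bar, each function returns `None` when its rule is not applicable, and the right-hand side used for the double rule, $u_1 u_4 u_3 u_2 u_5$, is the one commonly given for the string pointer reduction system in [2, 1]. All function names are hypothetical.

```python
def parse(text):
    """Turn '2 -3' into [(name, barred), ...]; '-' marks a bar (assumed encoding)."""
    return [(tok.lstrip("-"), tok.startswith("-")) for tok in text.split()]

def show(u):
    """Inverse of parse, for readable output."""
    return " ".join(("-" if barred else "") + name for name, barred in u)

def bar(x):
    """Toggle the bar on a single symbol."""
    return (x[0], not x[1])

def inverse(u):
    """Inverse of a string: reverse the order and toggle every bar."""
    return [bar(x) for x in reversed(u)]

def snr(u, p):
    """u1 p p u2 -> u1 u2."""
    for i in range(len(u) - 1):
        if u[i] == p and u[i + 1] == p:
            return u[:i] + u[i + 2:]
    return None

def spr(u, p):
    """u1 p u2 bar(p) u3 -> u1 inverse(u2) u3."""
    if p in u and bar(p) in u:
        i, j = u.index(p), u.index(bar(p))
        if i < j:
            return u[:i] + inverse(u[i + 1:j]) + u[j + 1:]
    return None

def sdr(u, p, q):
    """u1 p u2 q u3 p u4 q u5 -> u1 u4 u3 u2 u5."""
    pos_p = [i for i, x in enumerate(u) if x == p]
    pos_q = [i for i, x in enumerate(u) if x == q]
    if len(pos_p) == 2 and len(pos_q) == 2:
        (i, k), (j, l) = pos_p, pos_q
        if i < j < k < l:
            return u[:i] + u[k + 1:l] + u[j + 1:k] + u[i + 1:j] + u[l + 1:]
    return None
```

For example, `show(spr(parse("2 3 -2 3"), ("2", False)))` gives `-3 3`: the single symbol between the two occurrences of pointer 2 is inverted and the two occurrences are removed.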

We now recall a restriction of the model defined above. The motivation for
this restricted model is that it is less involved but still general enough to
allow for the successful assembly of all known experimentally obtained
micronuclear genes [5]. The restricted model, called simple gene assembly, was
originally defined on signed permutations, see [4, 9]. The model can also be
defined using string rewriting rules (in an equivalent way), as done for the
general model above. This model is defined and proven equivalent in [6], and we
recall it here. It turns out that it is necessary to use extended legal strings,
adding symbols b and e to legal strings.

The simple string pointer reduction system consists of three types of
reduction rules operating on extended legal strings. For all $p, q \in \Pi$ with
$\|p\| \neq \|q\|$:

- the string negative rule for p is defined by $\mathrm{snr}_p(u_1 p p u_2) = u_1 u_2$ as before,

- the simple string positive rule for p is defined by
  $\mathrm{sspr}_p(u_1 p u_2 \bar{p} u_3) = u_1 \bar{u}_2 u_3$, where $|u_2| = 1$, and

- the simple string double rule for p, q is defined by
  $\mathrm{ssdr}_{p,q}(u_1 p q u_2 p q u_3) = u_1 u_2 u_3$,

where $u_1$, $u_2$, and $u_3$ are arbitrary (possibly empty) strings over $\Psi$.
Note that the string negative rule is not changed, and that the simple version of
the string positive rule requires $|u_2| = 1$, while the simple version of the
string double rule requires $u_2 = u_4 = \lambda$ (in the string double rule
definition).
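
The simple rules are easy to state as list operations, since each of them only inspects adjacent symbols. The following Python sketch makes several assumptions that are not from the paper: strings are lists of `(name, barred)` pairs parsed from space-separated tokens with a leading `-` for a bar, the function names are hypothetical, and the string `5 -2 4 4 -5 3 -6 2 6 -b 3 e` is an illustrative input whose bar placement was chosen for this example.

```python
def parse(text):
    """Turn '2 -b' into [(name, barred), ...]; '-' marks a bar (assumed encoding)."""
    return [(tok.lstrip("-"), tok.startswith("-")) for tok in text.split()]

def show(u):
    """Inverse of parse, for readable output."""
    return " ".join(("-" if barred else "") + name for name, barred in u)

def bar(x):
    """Toggle the bar on a single symbol."""
    return (x[0], not x[1])

def snr(u, p):
    """u1 p p u2 -> u1 u2; None when not applicable."""
    for i in range(len(u) - 1):
        if u[i] == p and u[i + 1] == p:
            return u[:i] + u[i + 2:]
    return None

def sspr(u, p):
    """u1 p x bar(p) u3 -> u1 bar(x) u3 for a single symbol x; None otherwise."""
    for i in range(len(u) - 2):
        if u[i] == p and u[i + 2] == bar(p):
            return u[:i] + [bar(u[i + 1])] + u[i + 3:]
    return None

def ssdr(u, p, q):
    """u1 p q u2 p q u3 -> u1 u2 u3; None when p q does not occur twice."""
    hits = [i for i in range(len(u) - 1) if u[i] == p and u[i + 1] == q]
    if len(hits) == 2:
        i, j = hits
        return u[:i] + u[i + 2:j] + u[j + 2:]
    return None
```

On the illustrative string, applying `sspr` for barred 6, `snr` for 4, `sspr` for 5, `sspr` for 2 and `sspr` for barred 3 in this order leaves `b e`. On `b 2 3 4 2 3 4 e`, `ssdr` is applicable for the pairs (2, 3) and (3, 4) but not for (2, 4).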

Example 2. Let u = 524453626b3e be an extended legal string. Then within the
simple string pointer reduction system only $\mathrm{snr}_4$ and $\mathrm{sspr}_6$
are applicable to u. We have $\mathrm{sspr}_6(u)$ = 5244532b3e. Within the string
pointer reduction system also $\mathrm{spr}_5$ and $\mathrm{spr}_2$ are applicable
to u. We will use u (in addition to an extended legal string v, which is defined
later) as a running example.

A composition $\varphi = \rho_n \cdots \rho_2 \rho_1$ of string pointer rules
$\rho_i$ is a reduction of (extended) legal string u, if $\varphi$ is applicable
to (i.e., defined on) u. A reduction $\varphi$ of legal string u is successful if
$\varphi(u) = \lambda$, and a reduction $\varphi$ of extended legal string u is
successful if $\varphi(u) \in \{be, eb, \bar{e}\bar{b}, \bar{b}\bar{e}\}$. A
successful reduction corresponds to the transformation using recombination
operations of a gene in MIC form to MAC form. It turns out that not every
extended legal string has a successful reduction using only simple rules - take
e.g. 234234.

Example 3. In our running example,
$\varphi = \mathrm{sspr}_3\ \mathrm{sspr}_2\ \mathrm{sspr}_5\ \mathrm{snr}_4\ \mathrm{sspr}_6$
is a successful reduction of u, since $\varphi(u) = be$. All rules in $\varphi$
are simple.

5 Extended Overlap Graph 

The general string pointer reduction system has been made more abstract by 
replacing legal strings by so-called overlap graphs, and replacing string rewriting 
rules by graph rewriting rules. The obtained model is called the graph pointer 
reduction system. Unfortunately, this model is not fully equivalent to the string 
pointer reduction system since the string negative rule is not faithfully simulated. 
Also, overlap graphs are not suited for a graph model for simple gene assembly. 
We propose an extension to overlap graphs that allows one to faithfully model 
the string negative rule and the simple string positive rule using graphs and 
graph rewriting rules. First we recall the definition of overlap graph.

Definition 4. The overlap graph for (extended) legal string u is the signed
graph $(V, E, \sigma)$, where $V = \mathrm{dom}(u)$ and for all
$p, q \in \mathrm{dom}(u)$, $\{p, q\} \in E$ iff $q \in \mathrm{dom}(p')$ and
$p \in \mathrm{dom}(q')$ where $p'$ ($q'$, resp.) is the p-interval (q-interval)
of u. Finally, for $p \in \mathrm{dom}(u)$, $\sigma(p) = +$ iff p is positive in u.




Fig. 4. The overlap graph of u from Example 5. 



Example 5. Consider again extended legal string u = 524453626b3e. Then the
overlap graph $\mathcal{G}_u$ of u is given in Figure 4.

We say that $p, q \in \mathrm{dom}(u)$ overlap if there is an edge between p and q
in the overlap graph of u. We now define the extended overlap graph.

Definition 6. The extended overlap graph for (extended) legal string u is the
signed directed graph $(V, E, \sigma)$, denoted by $\mathcal{G}_u$, where
$V = \mathrm{dom}(u)$ and for all $p, q \in \mathrm{dom}(u)$, there is an edge
$(q, p)$ iff $q$ or $\bar{q}$ occurs in the p-interval of u. Finally, for
$p \in \mathrm{dom}(u)$, $\sigma(p) = +$ iff p is positive in u.

Notice first that between any two (different) vertices p and q we can have
the following possibilities:

1. There is no edge between them. This corresponds to
   $u = u_1 p u_2 p u_3 q u_4 q u_5$ or $u = u_1 q u_2 q u_3 p u_4 p u_5$ for some
   (possibly empty) strings $u_1, \ldots, u_5$ and possibly inversions of the
   occurrences of p and q in u.

2. There are exactly two edges between them, which are in opposite direction.
   This corresponds to the case where p and q overlap in u.

3. There is exactly one edge between them. If there is an edge from p to q, then
   this corresponds to the case where $u = u_1 q u_2 p u_3 p u_4 q u_5$ for some
   (possibly empty) strings $u_1, \ldots, u_5$ and possibly inversions of the
   occurrences of p and q in u.

As usual, we represent two directed edges in opposite direction (corresponding
to case number two above) by one undirected edge. In the remainder we will
use this notation and consider the extended overlap graph as having two sets
of edges: undirected edges and directed edges. In general, we will call graphs
with a special vertex m and having both undirected edges and directed edges
simple marked graphs.
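
The construction of the extended overlap graph, including the split into undirected and directed edges, can be sketched as follows. This is a minimal Python illustration; the encoding (space-separated tokens, leading `-` for a bar), the function names, the return format `(signs, undirected, directed)` and the input string in the usage note are assumptions of this sketch. The input is expected to be an extended legal string, so every name occurs exactly twice.

```python
MARKERS = {"b", "e"}

def parse(text):
    """Turn '2 -b' into [(name, barred), ...]; '-' marks a bar (assumed encoding)."""
    return [(tok.lstrip("-"), tok.startswith("-")) for tok in text.split()]

def norm(sym):
    """||x||: drop the bar; both markers are mapped to the single vertex 'm'."""
    return "m" if sym[0] in MARKERS else sym[0]

def extended_overlap_graph(u):
    """Return (signs, undirected, directed) for an extended legal string u."""
    names = [norm(x) for x in u]
    span = {}                                  # name -> [first index, last index]
    for i, n in enumerate(names):
        span.setdefault(n, []).append(i)
    # A vertex is positive iff exactly one of its two occurrences is barred.
    signs = {n: "+" if u[i][1] != u[j][1] else "-" for n, (i, j) in span.items()}
    # Edge (q, p) iff q or its barred form occurs inside the p-interval.
    edges = set()
    for p, (i, j) in span.items():
        for q in set(names[i:j + 1]) - {p}:
            edges.add((q, p))
    # Two opposite edges (overlap) are represented as one undirected edge.
    undirected = {frozenset(e) for e in edges if e[::-1] in edges}
    directed = {e for e in edges if e[::-1] not in edges}
    return signs, undirected, directed
```

For the illustrative string `5 -2 4 4 -5 3 -6 2 6 -b 3 e`, the sketch yields the undirected edges {2,5}, {2,3}, {2,6}, {3,m} and the directed edges 4 to 5, 4 to 2 and 6 to 3. Dropping `directed` gives the classical overlap relation, while `directed` alone records which intervals are nested in which.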







Fig. 5. The extended overlap graph of u from Example 7. 




Fig. 6. The extended overlap graph of v from Example 7. 



Example 7. Consider again extended legal string u = 524453626b3e. Then the
extended overlap graph $\mathcal{G}_u$ of u is given in Figure 5. Also, the
extended overlap graph of v = 42324e3b is given in Figure 6.

The undirected graph obtained by removing the directed edges is denoted
by $[\mathcal{G}_u]$. This is the 'classical' overlap graph of u, cf. Figures 4
and 5. On the other hand, the directed graph obtained by removing the undirected
edges is denoted by $[[\mathcal{G}_u]]$. This graph represents the proper nesting
of the p-intervals in the legal string.



6 Simple Graph Rules 

We will now define two types of rules for simple marked graphs $\gamma$. Each of
these rules transforms a simple marked graph of a certain form into another
simple marked graph. We will subsequently show that if $\gamma$ is the extended
overlap graph of a legal string, then these rules faithfully simulate the effect
of the snr and sspr rules on the underlying legal string.

Definition 8. Let $\gamma$ be a simple marked graph. Let p be any vertex of
$\gamma$ not equal to m.

- The graph negative rule for p, denoted by $\mathrm{gnr}_p$, is applicable to
  $\gamma$ if p is negative, there is no undirected edge e with p as an endpoint,
  and there is no directed edge from a vertex to p in $\gamma$. The result is the
  simple marked graph $\mathrm{gnr}_p(\gamma)$ obtained from $\gamma$ by removing
  vertex p and removing all edges connected to p. The set of all graph negative
  rules is denoted by Gnr.

- The simple graph positive rule for p, denoted by $\mathrm{sgpr}_p$, is
  applicable to $\gamma$ if p is positive, there is exactly one undirected edge e
  with p as an endpoint, and there is no directed edge from a vertex to p in
  $\gamma$. The result is the simple marked graph $\mathrm{sgpr}_p(\gamma)$
  obtained from $\gamma$ by removing vertex p, removing all edges connected to p,
  and flipping the sign of the other vertex q of e (i.e. changing the sign of q
  to + if it is - and to - if it is +). The set of all simple graph positive rules
  is denoted by sGpr.

These rules are called simple graph pointer rules.
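
Both rules are local operations on the graph, which the following Python sketch tries to make concrete. The representation is an assumption of this example: a simple marked graph is a triple `(signs, undirected, directed)` with `signs` a dict from vertex to `"+"` or `"-"`, `undirected` a set of two-element frozensets, and `directed` a set of `(source, target)` pairs. The function names are hypothetical, and each rule returns `None` when it is not applicable.

```python
def remove_vertex(g, p):
    """Delete vertex p together with every edge that touches it."""
    signs, und, dirs = g
    return ({v: s for v, s in signs.items() if v != p},
            {e for e in und if p not in e},
            {e for e in dirs if p not in e})

def has_incoming(g, p):
    """True iff some directed edge points to p."""
    return any(target == p for _, target in g[2])

def gnr(g, p):
    """Graph negative rule: p negative, no undirected edge at p, no edge into p."""
    signs, und, _ = g
    ok = (p != "m" and signs[p] == "-"
          and not any(p in e for e in und)
          and not has_incoming(g, p))
    return remove_vertex(g, p) if ok else None

def sgpr(g, p):
    """Simple graph positive rule: p positive, exactly one undirected edge, no edge into p."""
    signs, und, dirs = g
    at_p = [e for e in und if p in e]
    if p == "m" or signs[p] != "+" or len(at_p) != 1 or has_incoming(g, p):
        return None
    (q,) = at_p[0] - {p}                      # the other endpoint of the single edge
    flipped = dict(signs)
    flipped[q] = "-" if signs[q] == "+" else "+"
    return remove_vertex((flipped, und, dirs), p)
```

Note that `sgpr` only touches one neighbour of p, in line with Remark 9: no local complement of the neighbourhood has to be computed.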

Remark 9. The sgpr rule is much simpler than the gpr for 'classical' overlap 
graphs. One does not need to compute the 'local complement' of the set of 
adjacent vertices. Obviously, this is because the simple rule allows only a single 
pointer in the p-interval. □ 




Fig. 7. The simple marked graph $\mathrm{sgpr}_6(\mathcal{G}_u)$.



Example 10. Rules $\mathrm{gnr}_4$ and $\mathrm{sgpr}_6$ are the only applicable
rules on the simple marked graph $\gamma = \mathcal{G}_u$ of Figure 5. Simple
marked graph $\mathrm{sgpr}_6(\gamma)$ is depicted in Figure 7.

Similarly as for strings, a composition $\varphi = \rho_n \cdots \rho_2 \rho_1$ of
graph pointer rules $\rho_i$ is a reduction of simple marked graph $\gamma$, if
$\varphi$ is applicable to (i.e., defined on) $\gamma$. A reduction $\varphi$ of
$\gamma$ is successful if $\varphi(\gamma)$ is the graph having only vertex m
where m is negative. For $S \subseteq \{\mathrm{Gnr}, \mathrm{sGpr}\}$, we say that
$\gamma$ is successful in S if there is a successful reduction of $\gamma$ using
only graph pointer rules from S.

Example 11. In our running example,
$\varphi = \mathrm{sgpr}_3\ \mathrm{sgpr}_2\ \mathrm{sgpr}_5\ \mathrm{gnr}_4\ \mathrm{sgpr}_6$
is a successful reduction of $\mathcal{G}_u$.

We now show that these two types of rules faithfully simulate the string 
negative rule and the simple string positive rule. 

Lemma 12. Let u be a legal string and let $p \in \Pi$. Then $\mathrm{snr}_p$ is
applicable to u iff $\mathrm{gnr}_{\|p\|}$ is applicable to $\mathcal{G}_u$. In this
case, $\mathcal{G}_{\mathrm{snr}_p(u)} = \mathrm{gnr}_{\|p\|}(\mathcal{G}_u)$.

Proof. We have $\mathrm{snr}_p$ is applicable to u iff $u = u_1 p p u_2$ for some
strings $u_1$ and $u_2$ iff $\|p\|$ is negative in u and the $\|p\|$-interval
contains no other symbols iff $\|p\|$ is negative in $\mathcal{G}_u$ and there is
no undirected edge with $\|p\|$ as endpoint and there is no directed edge to
$\|p\|$ iff $\mathrm{gnr}_{\|p\|}$ is applicable to $\mathcal{G}_u$.

In this case, $\mathcal{G}_{\mathrm{snr}_p(u)}$ is obtained from $\mathcal{G}_u$ by
removing vertex $\|p\|$ and the edges connected to $\|p\|$, hence
$\mathcal{G}_{\mathrm{snr}_p(u)}$ is equal to
$\mathrm{gnr}_{\|p\|}(\mathcal{G}_u)$. □

Fig. 8. A commutative diagram illustrating Lemma 12.



The previous lemma is illustrated as a commutative diagram in Figure 8. 
The next lemma shows that a similar diagram can be made for the simple string 
positive rule. 

Lemma 13. Let u be a legal string and let $p \in \Pi$. Then $\mathrm{sspr}_p$ is
applicable to u iff $\mathrm{sgpr}_{\|p\|}$ is applicable to $\mathcal{G}_u$. In
this case, $\mathcal{G}_{\mathrm{sspr}_p(u)} = \mathrm{sgpr}_{\|p\|}(\mathcal{G}_u)$.

Proof. We have $\mathrm{sspr}_p$ is applicable to u iff
$u = u_1 p u_2 \bar{p} u_3$ for some strings $u_1$, $u_2$, and $u_3$ with
$|u_2| = 1$ iff $\|p\|$ is positive in u (or equivalently in $\mathcal{G}_u$) and
there is exactly one undirected edge e with $\|p\|$ as endpoint and there is no
directed edge to $\|p\|$ iff $\mathrm{sgpr}_{\|p\|}$ is applicable to
$\mathcal{G}_u$.

In this case, $\mathcal{G}_{\mathrm{sspr}_p(u)}$ is obtained from $\mathcal{G}_u$
by removing vertex $\|p\|$, removing all edges connected to $\|p\|$, and flipping
the sign of the other vertex of e. Hence $\mathcal{G}_{\mathrm{sspr}_p(u)}$ is
equal to $\mathrm{sgpr}_{\|p\|}(\mathcal{G}_u)$. □

Example 14. In our running example, one can easily verify that the extended
overlap graph of $\mathrm{sspr}_6(u)$ = 5244532b3e is equal to graph
$\mathrm{sgpr}_6(\mathcal{G}_u)$ given in Figure 7.
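
The commutation stated in Lemma 13 can be checked on a concrete input by computing both sides. The Python sketch below does this for one string; the encoding (space-separated tokens, leading `-` for a bar), the graph representation `(signs, undirected, directed)`, all function names and the input string are assumptions of this example, and `sgpr` here assumes that the rule is applicable rather than checking it.

```python
MARKERS = {"b", "e"}

def parse(text):
    """Turn '2 -b' into [(name, barred), ...]; '-' marks a bar (assumed encoding)."""
    return [(tok.lstrip("-"), tok.startswith("-")) for tok in text.split()]

def norm(sym):
    """Drop the bar; both markers become the single vertex 'm'."""
    return "m" if sym[0] in MARKERS else sym[0]

def graph(u):
    """Extended overlap graph of u as (signs, undirected, directed)."""
    names = [norm(x) for x in u]
    span = {}
    for i, n in enumerate(names):
        span.setdefault(n, []).append(i)
    signs = {n: "+" if u[i][1] != u[j][1] else "-" for n, (i, j) in span.items()}
    edges = {(q, p) for p, (i, j) in span.items() for q in set(names[i:j + 1]) - {p}}
    und = {frozenset(e) for e in edges if e[::-1] in edges}
    dirs = {e for e in edges if e[::-1] not in edges}
    return signs, und, dirs

def sspr(u, p):
    """Simple string positive rule for the symbol p = (name, barred)."""
    q = (p[0], not p[1])
    for i in range(len(u) - 2):
        if u[i] == p and u[i + 2] == q:
            x = u[i + 1]
            return u[:i] + [(x[0], not x[1])] + u[i + 3:]
    return None

def sgpr(g, p):
    """Simple graph positive rule for vertex p (applicability is assumed)."""
    signs, und, dirs = g
    (edge,) = [e for e in und if p in e]
    (q,) = edge - {p}
    new_signs = {v: s for v, s in signs.items() if v != p}
    new_signs[q] = "-" if signs[q] == "+" else "+"
    return (new_signs,
            {e for e in und if p not in e},
            {e for e in dirs if p not in e})
```

With `u = parse("5 -2 4 4 -5 3 -6 2 6 -b 3 e")`, the graph of `sspr(u, ("6", True))` and the result of `sgpr(graph(u), "6")` coincide, which is one instance of the diagram for the simple positive rule.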




Fig. 9. The extended overlap graph of w = b234234e. 



One may be wondering at this point why we have not defined the simple
graph double rule. To see why, consider extended legal string w = b234234e.
Note that $\mathrm{ssdr}_{2,3}$ and $\mathrm{ssdr}_{3,4}$ are applicable to w, but
$\mathrm{ssdr}_{2,4}$ is not applicable to w. However, this information is lost in
$\mathcal{G}_w$ - applying the isomorphism that interchanges vertices 2 and 3 in
$\mathcal{G}_w$ gives us $\mathcal{G}_w$ again, see Figure 9. Thus, given only
$\mathcal{G}_w$ it is impossible to deduce applicability of the simple graph
double rule.

To successfully define a simple graph double rule, one needs to retain
information on which pointers are next to each other, and therefore different
concepts are required. However, such a concept would require that the linear
representation of the pointers in an extended legal string is retained. Hence,
for this rule string representations are more natural than graph
representations.

The next lemma shows that simple marked graphs that are extended overlap 
graphs are quite restricted in form. We will use this restriction in the next section. 

Lemma 15. Let u be a legal string. Then $[[\mathcal{G}_u]]$ is acyclic and
transitively closed.

Proof. There is a (directed) edge from p to q in $[[\mathcal{G}_u]]$ iff the
p-interval is completely contained in the q-interval of u. A nesting relation of
intervals is acyclic and transitive. □

Remark 16. We have seen that $[\mathcal{G}_u]$ is the overlap graph of u. Not every
graph is an overlap graph - a characterization of which graphs are overlap graphs
is given in [10]. Hence, both $[[\mathcal{G}_u]]$ and $[\mathcal{G}_u]$ are
restricted in form compared to graphs in general. □



7 Characterizing Successfulness 

In this section we characterize successfulness of simple marked graphs in
$S \subseteq \{\mathrm{Gnr}, \mathrm{sGpr}\}$. First we consider the case
$S = \{\mathrm{Gnr}\}$.

Remark 17. In the general (not simple) model, which has different graph pointer
rules and is based on overlap graphs, successfulness in S has been characterized
for those S which include the graph negative rules (note that these rules are
different from the graph negative rules defined here) - the cases where S does
not contain the graph negative rules remain open. □

Theorem 18. Let $\gamma$ be a simple marked graph. Then $\gamma$ is successful in
{Gnr} iff each vertex of $\gamma$ is negative, $\gamma$ has no undirected edges,
and $\gamma$ is acyclic.

Proof. Since $[[\gamma]] = \gamma$ is acyclic, there is a linear ordering
$(p_1, p_2, \ldots, p_n)$ of the vertices of $\gamma$ such that if there is an edge
from $p_i$ to $p_j$, then $i < j$. The result now follows by the definition of gnr.
In this case, linear ordering $(p_1, p_2, \ldots, p_n)$ corresponds to a successful
reduction
$\varphi = \mathrm{gnr}_{p_{n-1}} \cdots \mathrm{gnr}_{p_2}\ \mathrm{gnr}_{p_1}$
of $\gamma$. □
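
The linear ordering used in the proof is a topological ordering of the directed edges, which suggests the following Python sketch. It relies on `graphlib.TopologicalSorter` from the standard library (Python 3.9 or later). The graph representation (`signs` as a dict, `undirected` as a set of frozensets, `directed` as a set of `(source, target)` pairs) and the function name `gnr_order` are assumptions of this sketch. It also adds one guard that is an assumption of the sketch rather than part of the theorem statement: m must have no outgoing directed edge, so that m can be placed last as in the proof.

```python
from graphlib import TopologicalSorter, CycleError

def gnr_order(signs, undirected, directed):
    """Return an order in which gnr removes every vertex except m, or None."""
    # Conditions of Theorem 18: all vertices negative and no undirected edges.
    if undirected or any(s != "-" for s in signs.values()):
        return None
    # Extra guard (assumption of this sketch): m stays, so it may not block anyone.
    if any(source == "m" for source, _ in directed):
        return None
    # gnr needs no incoming edge, so every source of an edge goes before its target.
    preds = {v: set() for v in signs}
    for source, target in directed:
        preds[target].add(source)
    try:
        order = list(TopologicalSorter(preds).static_order())
    except CycleError:
        return None                            # the graph is not acyclic
    return [v for v in order if v != "m"]
```

For the graph with negative vertices 2, 3, m and directed edges 3 to 2, 2 to m and 3 to m (the shape one gets from a string such as b 2 3 3 2 e), the function returns the order 3, 2: first gnr for 3, then gnr for 2.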

Using Lemma 15, more can be said if $\gamma = \mathcal{G}_u$ for some legal string
u.

Corollary 19. Let $\gamma = \mathcal{G}_u$ for some legal string u. Then $\gamma$
is successful in {Gnr} iff each vertex of $\gamma$ is negative and $\gamma$ has no
undirected edges. In this case, $\gamma$ is the transitive closure of a forest,
where edges in the forest are directed from children to their parents.

Next we turn to the case $S = \{\mathrm{sGpr}\}$.

Theorem 20. Let $\gamma$ be a simple marked graph. Then $\gamma$ is successful in
{sGpr} iff the following conditions hold:

1. $[\gamma]$ is an (undirected) tree,

2. for each vertex v of $[\gamma]$, the degree of v is even iff v is negative in
   $\gamma$, and

3. the graph obtained by replacing each undirected edge in $\gamma$ by a directed
   edge from the child to the parent in tree $[\gamma]$ with root m is acyclic.

Proof sketch. It can be verified that each of the two statements holds iff there
is a linear ordering $(p_1, p_2, \ldots, p_n)$ of the vertices of $\gamma$ such
that $p_n = m$, and for each $p_i$ with $i \in \{1, \ldots, n\}$ the following
holds:

1. the number of undirected edges from vertices $p_j$ with $j < i$ to $p_i$ is
   even iff $p_i$ is positive in $\gamma$,

2. if $i < n$, then there is exactly one undirected edge between $p_i$ and another
   vertex $p_j$ with $j > i$, and

3. there is no directed edge from a vertex $p_j$ to $p_i$ with $j > i$.

In this case, linear ordering $(p_1, p_2, \ldots, p_n)$ corresponds to a successful
reduction
$\varphi = \mathrm{sgpr}_{p_{n-1}} \cdots \mathrm{sgpr}_{p_2}\ \mathrm{sgpr}_{p_1}$
of $\gamma$. □
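
The three conditions of Theorem 20 can be tested directly on a graph. The Python sketch below is one way to do so; the graph representation (`signs` as a dict, `undirected` as a set of two-element frozensets, `directed` as a set of `(source, target)` pairs) and the function name `successful_in_sgpr` are assumptions of this example, and cycle detection is delegated to `graphlib.TopologicalSorter` (Python 3.9 or later).

```python
from graphlib import TopologicalSorter, CycleError

def successful_in_sgpr(signs, undirected, directed):
    """Check conditions 1-3 of Theorem 20 for a simple marked graph."""
    adj = {v: set() for v in signs}
    for edge in undirected:
        a, b = tuple(edge)
        adj[a].add(b)
        adj[b].add(a)
    # Condition 1: the undirected part is a tree (connected, |V| - 1 edges).
    parent = {"m": None}                       # tree rooted at m
    stack = ["m"]
    while stack:
        v = stack.pop()
        for w in adj[v]:
            if w not in parent:
                parent[w] = v
                stack.append(w)
    if len(undirected) != len(signs) - 1 or len(parent) != len(signs):
        return False
    # Condition 2: the degree is even iff the vertex is negative.
    if any((len(adj[v]) % 2 == 0) != (signs[v] == "-") for v in signs):
        return False
    # Condition 3: orient tree edges child -> parent, add directed edges, test for cycles.
    preds = {v: set() for v in signs}
    for source, target in directed:
        preds[target].add(source)
    for child, par in parent.items():
        if par is not None:
            preds[par].add(child)
    try:
        TopologicalSorter(preds).prepare()     # raises CycleError on a cycle
    except CycleError:
        return False
    return True
```

As an illustration, take a graph with positive vertices 2, 3, 4, m, undirected edges {3,4}, {2,3}, {3,m} and one directed edge from 2 to 4: all three conditions hold. The directed edge forces 2 to be removed before 4, which matches the kind of ordering constraint discussed in the proof.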

Example 21. Consider again extended legal string u of Example 7 with its
extended overlap graph $\mathcal{G}_u$ given in Figure 5. Then by Theorem 20,
$\mathcal{G}_u$ is not successful in {sGpr}, since condition 1 is violated -
$[\mathcal{G}_u]$ is not a tree as it has two connected components.

Reconsider now extended legal string v of Example 7 with its extended overlap
graph $\mathcal{G}_v$ given in Figure 6. By Theorem 20, $\mathcal{G}_v$ is
successful in {sGpr}. According to the proof of Theorem 20, (2, 4, 3, m) is a
linear ordering of the vertices corresponding to a successful (graph) reduction
$\varphi = \mathrm{sgpr}_3\ \mathrm{sgpr}_4\ \mathrm{sgpr}_2$ of $\mathcal{G}_v$. By
Lemma 13, this in turn corresponds to a successful (string) reduction $\varphi'$
of v - one can verify that we can take
$\varphi' = \mathrm{sspr}_3\ \mathrm{sspr}_4\ \mathrm{sspr}_2$. Moreover, by the
proof of Theorem 20, linear ordering (4, 2, 3, m) does not correspond to a
successful reduction of $\mathcal{G}_v$ (or of v).

Finally, we consider the case S = {Gnr, sGpr}. 

Theorem 22. Let $\gamma$ be a simple marked graph. Then $\gamma$ is successful in
{Gnr, sGpr} iff all of the conditions of Theorem 20 hold, except that in
condition 1) $[\gamma]$ is a forest instead of a tree, and in condition 3) for each
tree in the forest we can identify a root, where m is one such root, such that the
graph obtained by replacing each undirected edge e in $\gamma$ by a directed edge
from the child to the parent in the tree to which e belongs, is acyclic.

Proof sketch. It can be verified that each of the two statements holds iff
there is an ordering $(p_1, p_2, \ldots, p_n)$ of the vertices of $\gamma$ such
that for each $p_i$ with $i \in \{1, \ldots, n\}$, condition 1) in the proof of
Theorem 20 holds, and either conditions 2) and 3) of that proof hold or there is
no edge (directed or not) between a vertex $p_j$ and $p_i$ with $j > i$.

Again, in this case, linear ordering $(p_1, p_2, \ldots, p_n)$ corresponds to a
successful reduction $\varphi$ of $\gamma$ where the vertices corresponding to
roots in forest $[\gamma]$ (except m) are used in gnr rules, while the other
vertices are used in sgpr rules. □

Example 23. Consider again extended legal string u of Example 7 with its
extended overlap graph $\mathcal{G}_u$ given in Figure 5. By Theorem 22,
$\mathcal{G}_u$ is successful in {Gnr, sGpr}. By the proof of Theorem 22,
(6, 4, 5, 2, 3, m), (4, 6, 5, 2, 3, m), and (4, 5, 6, 2, 3, m) are the linear
orderings of the vertices that correspond to successful reductions of
$\mathcal{G}_u$ in {Gnr, sGpr}. Moreover, in each case vertex 4 corresponds to the
$\mathrm{gnr}_4$ rule while the other pointers correspond to sgpr rules.

8 Discussion 

We have shown that we can partially model simple gene assembly based on
a natural extension of the well-known concept of overlap graph. The model is
partial in the sense that the simple string double rule does not have a graph
rule counterpart. Within this partial model we characterize which micronuclear
genes can be successfully assembled using 1) only graph negative rules, 2) only
simple graph positive rules, and 3) both of these types of rules. These results
carry over to the corresponding simple string pointer rules.

What remains is a graph rule counterpart of the simple string double rule.
However, such a counterpart would require different concepts, since neither the
overlap graph nor any natural extension of it captures the requirement that
pointers p and q (in the rule) are next to each other in the string.